Introduction
Web data
Methods
Results
Conclusions
Because new digital activities are rarely—if ever—captured in official state data, researchers must rely on information gathered from alternative sources (Zook and McCanless 2022).
Guide policies for deployment of new technologies
Predictions of introduction times for future technologies (Meade and Islam 2021):
Network operators
Suppliers of network equipment
Regulatory authorities
As in temporal diffusion models, an S-shaped pattern in the cumulative level of adoption
A hierarchy effect: from main centres to secondary ones – central places
A neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)
Hägerstrand (1965): from innovative centres (core) through a hierarchy of sub-centres, to the periphery
Diffusion of an intangible/digital technology [web]
Map the active engagement with the digital
Over time, early stages of the internet [1996-2012]
Granular and multi-scale spatial perspective
Data from the Internet Archive, the oldest web archive
Observe commercial websites in the UK (.co.uk), 1996-2012
Geolocation: postcode references in the text
Timestamp: archival year
Counts
JISC UK Web Domain Dataset: all archived webpages from the .uk domain 1996-2012
Curated by the British Library
All .uk archived webpages which contain a UK postcode in the web text
Circa 0.5 billion URLs with valid UK postcodes
20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD
All the archived .uk webpages
Archived during 1996-2012
Commercial webpages (.co.uk)
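The example record and the three filters above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the pipe-delimited layout is taken from the example line, the `YYYYMMDDhhmmss` reading of the timestamp is an assumption based on that example, and the function names `parse_record` and `keep_record` are hypothetical.

```python
from urllib.parse import urlparse

def parse_record(line):
    """Split 'timestamp | url | postcode' into (year, host, postcode)."""
    timestamp, url, postcode = [part.strip() for part in line.split("|")]
    year = int(timestamp[:4])            # assumes a YYYYMMDDhhmmss timestamp
    host = urlparse(url).hostname or ""  # e.g. 'www.website1.co.uk'
    return year, host.lower(), postcode

def keep_record(year, host):
    """Keep commercial (.co.uk) pages archived in 1996-2012."""
    return 1996 <= year <= 2012 and host.endswith(".co.uk")

record = "20080509162138 | http://www.website1.co.uk/contact_us | IG8 8HD"
year, host, postcode = parse_record(record)
print(year, host, postcode, keep_record(year, host))
# prints: 2008 www.website1.co.uk IG8 8HD True
```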
From webpages to websites:
- http://www.website1.co.uk/webpage1 and
- http://www.website1.co.uk/webpage2 are part of the same website
One vs. multiple postcodes in a website
| Postcodes per website | Websites | Share | Cumulative websites | Cumulative share |
|---|---|---|---|---|
| (0,1] | 41,596 | 0.718 | 41,596 | 0.718 |
| (1,2] | 6,451 | 0.111 | 48,047 | 0.830 |
| (2,10] | 6,163 | 0.106 | 54,210 | 0.936 |
| (10,100] | 2,975 | 0.051 | 57,185 | 0.988 |
| (100,1000] | 646 | 0.011 | 57,831 | 0.999 |
| (1000,10000] | 62 | 0.001 | 57,893 | 1.000 |
| (10000,100000] | 4 | 0.000 | 57,897 | 1.000 |
Websites with a large number of postcodes: e.g. directories, real estate websites
Focus on websites with one unique postcode per year
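The step from webpages to single-postcode websites can be sketched as below. It is an illustration under stated assumptions: pages are grouped by host name, postcodes are compared after removing spaces and upper-casing, and `single_postcode_websites` is a hypothetical helper, not the authors' code. The sample records are made up.

```python
from collections import defaultdict

def single_postcode_websites(records):
    """records: iterable of (year, host, postcode), one per archived page.
    Returns {(host, year): postcode} for website-years with one unique postcode."""
    postcodes = defaultdict(set)
    for year, host, postcode in records:
        # Pages sharing a host are treated as one website (an assumption).
        postcodes[(host, year)].add(postcode.replace(" ", "").upper())
    return {key: next(iter(pcs)) for key, pcs in postcodes.items() if len(pcs) == 1}

pages = [
    (2008, "www.website1.co.uk", "IG8 8HD"),
    (2008, "www.website1.co.uk", "ig8 8hd"),    # same postcode on another page
    (2008, "www.directory.co.uk", "IG8 8HD"),
    (2008, "www.directory.co.uk", "SW1A 1AA"),  # two postcodes: dropped
]
print(single_postcode_websites(pages))
# prints: {('www.website1.co.uk', 2008): 'IG88HD'}
```

Directory-style websites drop out automatically because they carry more than one postcode in a given year.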
S-shaped pattern in the cumulative level of adoption
A hierarchy effect: from main centres to secondary ones
A neighbourhood effect: first “hitting” nearby locations
Cumulative adoption: Self-starting logistic growth model
[nls and SSlogis]
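R's `SSlogis` parameterises the logistic curve as \(Asym / (1 + e^{(xmid - t)/scal})\). The sketch below fits the same curve in plain Python as a stand-in for `nls`; it is not equivalent to `nls`. It grid-searches the ceiling `asym` (an assumed range of 1.01 to 3 times the largest observation) and, for each candidate, solves `xmid` and `scal` by linear regression on the logit scale. The data are synthetic and `fit_logistic` is a hypothetical name.

```python
import math

def logistic(t, asym, xmid, scal):
    """SSlogis-style curve: asym / (1 + exp((xmid - t) / scal))."""
    return asym / (1.0 + math.exp((xmid - t) / scal))

def fit_logistic(ts, ys, n_grid=200):
    """Crude least-squares fit; requires every y > 0 and an increasing trend."""
    n = len(ts)
    t_bar = sum(ts) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    y_max = max(ys)
    best = None
    for k in range(1, n_grid + 1):
        asym = y_max * (1.0 + 0.01 * k)  # candidate ceiling above max(y)
        # On the logit scale the curve is linear:
        # log(y / (asym - y)) = (t - xmid) / scal
        zs = [math.log(y / (asym - y)) for y in ys]
        z_bar = sum(zs) / n
        slope = sum((t - t_bar) * (z - z_bar) for t, z in zip(ts, zs)) / sxx
        scal = 1.0 / slope
        xmid = t_bar - z_bar * scal      # where the fitted line crosses zero
        sse = sum((y - logistic(t, asym, xmid, scal)) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, asym, xmid, scal)
    return best[1:]

# Synthetic cumulative adoption counts for 1996-2012 (not the study's data).
years = list(range(1996, 2013))
counts = [logistic(t, 1000.0, 2004.0, 2.0) for t in years]
asym, xmid, scal = fit_logistic(years, counts)
print(round(asym), round(xmid, 1), round(scal, 2))
```

Here `xmid` is the inflection year of the S-curve and `scal` controls how fast adoption rises around it.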
Descriptive statistics, ESDA & density regressions
Machine learning framework [random forests]
Two scales:
Spatial heterogeneity
No clear, easy-to-explain pattern
Adoption heterogeneity
Different perceptions of risk and economic returns from new technologies
Early adopters vs. laggards, leapfrogging
Spatial heterogeneity
Expected volatility
Neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)
Websites per firm in Local Authorities (c. 400)
Websites in Output Areas (c. 200,000)
Hierarchy effect: from main centres to secondary ones – central places
Almost perfect polarisation of web adoption in the early stages at a granular level
Polarisation decreases over time
More equally diffused at the Local Authority level
A hierarchy effect: from main centres to secondary ones
A neighbourhood effect: first “hitting” nearby locations
S-shaped pattern in the cumulative level of adoption
\[Website\,Density_{i,t} \sim
\color{orange}{Distance\,London_{i}} +
\color{orange}{Distance\,Nearest\,City_{i}} +
\color{orange}{Distance\,Nearest\,Retail_{i}} +\\
\color{orange}{Website\,Density\,London_{t-1}} +
\color{orange}{Website\,Density\,Nearest\,City_{i,t-1}} +
\color{orange}{Website\,Density\,Nearest\,Retail_{i,t-1}} +\\
\color{olive}{W*\, Website\,Density_{i,t-1}} +\\
\color{violet}{year_{t}}\]
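The spatial-lag term \(W * Website\,density_{t-1}\) can be illustrated with a small sketch. The deck does not say how \(W\) is built, so a row-standardised neighbour list is assumed here; the unit names, density values, and function names are all hypothetical.

```python
def row_standardise(neighbours):
    """{unit: [neighbouring units]} -> {unit: {neighbour: weight}}, weights sum to 1."""
    return {u: {v: 1.0 / len(ns) for v in ns} for u, ns in neighbours.items() if ns}

def spatial_lag(weights, density_prev):
    """W * density_{t-1}: weighted mean of each unit's neighbours' lagged density."""
    return {
        u: sum(w * density_prev[v] for v, w in row.items())
        for u, row in weights.items()
    }

# Toy example with three areas and made-up densities for year t-1.
neighbours = {"A": ["B", "C"], "B": ["A"], "C": ["A", "B"]}
density_prev = {"A": 0.10, "B": 0.30, "C": 0.50}
lag = spatial_lag(row_standardise(neighbours), density_prev)
print(lag)
```

With row-standardised weights the lag is the average lagged density of an area's neighbours, so area A, whose neighbours are B and C, gets a value close to 0.4.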
Random forests to predict \(Website\,Density_{i,t}\)
2 spatial resolutions:
2 sets of models:
Space-time sensitive 10-fold CV (CAST)
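The idea behind space-time sensitive cross-validation can be sketched as follows. This is a simplified stand-in written in Python, not the CAST implementation: each spatial unit and each year is assigned to one of `k` folds, a fold is tested on held-out units in held-out years, and it is trained only on data from neither. Whether the authors held out locations, time steps, or both is not stated in the deck, and `spacetime_folds` is a hypothetical name.

```python
import random

def spacetime_folds(observations, k=10, seed=0):
    """observations: list of (unit, year). Returns k (train_idx, test_idx) pairs."""
    rng = random.Random(seed)
    units = sorted({u for u, _ in observations})
    years = sorted({y for _, y in observations})
    rng.shuffle(units)
    rng.shuffle(years)
    unit_fold = {u: i % k for i, u in enumerate(units)}
    year_fold = {y: i % k for i, y in enumerate(years)}
    folds = []
    for f in range(k):
        # Test set: held-out units observed in held-out years.
        test = [i for i, (u, y) in enumerate(observations)
                if unit_fold[u] == f and year_fold[y] == f]
        # Training set: neither the unit nor the year is held out.
        train = [i for i, (u, y) in enumerate(observations)
                 if unit_fold[u] != f and year_fold[y] != f]
        folds.append((train, test))
    return folds

# Toy panel: 30 made-up areas observed every year from 1996 to 2012.
panel = [(f"area{a}", y) for a in range(30) for y in range(1996, 2013)]
folds = spacetime_folds(panel, k=10)
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

Because no unit or year appears on both sides of a fold, the error estimate is less likely to be flattered by spatial or temporal autocorrelation.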
| | RMSE | RSquared | MAE |
|---|---|---|---|
| Local Authorities | 0.032 | 0.810 | 0.019 |
| Output Areas | 5.000 | 0.205 | 1.047 |
No retail centre variables for the Local Authority models
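The three accuracy measures reported in the table above can be computed as in this sketch. R² is taken here as the squared Pearson correlation between observed and predicted values, which is one common convention; the deck does not state which definition was used. The numbers are made up.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    o_bar, p_bar = sum(obs) / n, sum(pred) / n
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

RMSE and MAE are in the units of the outcome, so they are not directly comparable between the Local Authority and Output Area models when the density scales differ; R² is unit-free.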
| Region | RSquared |
|---|---|
| Northern Ireland | 0.576 |
| North West | 0.664 |
| Scotland | 0.770 |
| London | 0.805 |
| South West | 0.864 |
| East of England | 0.876 |
| East Midlands | 0.882 |
| West Midlands | 0.883 |
| North East | 0.895 |
| Yorkshire and The Humber | 0.906 |
| Wales | 0.916 |
| South East | 0.947 |
\[Website\,Density\,Growth_{i,t} \sim
\color{orange}{Distance\,London_{i}} +
\color{orange}{Distance\,Nearest\,City_{i}} +
\color{orange}{Distance\,Nearest\,Retail_{i}} +\\
\color{orange}{Website\,Density\,London_{t-1}} +
\color{orange}{Website\,Density\,Nearest\,City_{i,t-1}} +
\color{orange}{Website\,Density\,Nearest\,Retail_{i,t-1}} +\\
\color{olive}{W*\, Website\,Density_{i,t-1}} +\\
\color{violet}{year_{t}}\]
| | RMSE | RSquared | MAE |
|---|---|---|---|
| Local Authorities | 0.200 | 0.634 | 0.148 |
No retail centre variables for the Local Authority models
| Region | RSquared |
|---|---|
| Northern Ireland | 0.473 |
| Scotland | 0.697 |
| North West | 0.768 |
| South West | 0.833 |
| London | 0.852 |
| North East | 0.866 |
| East of England | 0.885 |
| Yorkshire and The Humber | 0.885 |
| East Midlands | 0.885 |
| Wales | 0.892 |
| West Midlands | 0.903 |
| South East | 0.915 |
Established technological diffusion drivers still apply
Geography matters: spatial dependency, urban gravitation
Some indications of a hierarchical diffusion
Granular analysis reveals patterns otherwise not visible
Stability and volatility: leapfrogging, early adopters dropping, but also stable positions
Spatially consistent mechanisms at local scale
Heterogeneity increases with resolution
Things to do: OA?